(54) Title: DIFFERENTIALLY EXPRESSED GENES IN HEALTHY AND DISEASED SUBJECTS
(57) Abstract

The present invention involves methods and compositions for identifying genes which are differentially expressed in a normal healthy
animal and an animal having a selected disease or infection, and methods for diagnosing diseases or infections characterized by the presence
of those genes, despite the absence of knowledge about the gene or its function. The methods involve the use of a composition suitable
for use in hybridization which consists of a solid surface on which is immobilized at pre-defined regions thereon a plurality of defined
oligonucleotide/polynucleotide sequences for hybridization. Each sequence comprises a fragment of an EST isolated from an identified
DNA library prepared from tissue or cell samples of a healthy animal, an animal with a selected disease or infection, and any combination
thereof. Differences in hybridization patterns produced through use of this composition and the specified methods enable diagnosis of
disease based on differential expression of genes of unknown function, and enable the identification of those genes and the proteins encoded
thereby.
DIFFERENTIALLY EXPRESSED GENES IN HEALTHY AND DISEASED SUBJECTS

Cross Reference to Related Applications

This application is a continuation-in-part application of U.S. Serial No.
08/195,485, filed February 14, 1994, the contents of which are incorporated herein by
reference.

Field of the Invention

The present invention relates to the use of immobilized
oligonucleotide/polynucleotide or polynucleotide sequences for the identification,
sequencing and characterization of genes which are implicated in disease, infection,
or development and the use of such identified genes and the proteins encoded thereby
in diagnosis, prognosis, therapy and drug discovery.

Background of the Invention

Identification, sequencing and characterization of genes, especially
human genes, is a major goal of modern scientific research. By identifying genes,
determining their sequences and characterizing their biological function, it is possible
to employ recombinant DNA technology to produce large quantities of valuable "gene
products", e.g., proteins and peptides. Additionally, knowledge of gene sequences
can provide a key to diagnosis, prognosis and treatment of a variety of disease states
in plants and animals which are characterized by inappropriate expression and/or
repression of selected gene(s) or by the influence of external factors, e.g., carcinogens
or teratogens, on gene function. The term disease-associated gene(s) is used herein
in its broadest sense to mean not only genes associated with classical inherited
diseases, but also those associated with genetic predisposition to disease as well as
infectious or pathogenic states resulting from gene expression by infectious agents or
the effect on host cell gene expression by the presence of such a pathogen or its
products. Locating disease-associated genes will permit the development of
diagnostic and prognostic reagents and methods, as well as possible therapeutic
regimens, and the discovery of new drugs for treating or preventing the occurrence of
such diseases.

Methods have been described for the identification of certain novel
gene sequences, referred to as Expressed Sequence Tags (EST) [see, e.g., Adams et
al., Science, 252:1651-1656 (1991), and International Patent Application No.
WO93/00353, published January 7, 1993]. Conventionally, an EST is a specific cDNA
polynucleotide sequence, or tag, about 150 to 400 nucleotides in length, derived from
a messenger RNA molecule, which is a marker for, and component of, a human gene
actually transcribed in vivo. However, as used herein an "EST" also refers to a
genomic DNA fragment derived from an organism, such as a microorganism, the DNA
of which lacks introns.

A variety of techniques have been described for identifying particular
gene sequences on the basis of their gene products. For example, several techniques
are described in the art [see, e.g., International Patent Application No. WO91/07087,
published May 30, 1991]. Additionally, known methods exist for the amplification of
desired sequences [see, e.g., International Patent Application No. WO91/17271,
published November 14, 1991, among others].

However, at present, there exist no established methods for filling the
need in the art for methods and reagents which employ fragments of differentially
expressed genes of known, unknown (or previously unrecognized) function or
consequence to provide diagnostic and therapeutic methods and reagents for diagnosis
and treatment of disease or infection, which conditions are characterized by such
genes and gene products. It should be appreciated that it is the expression differences
that are diagnostic of the altered state (e.g., predisease, disease, pathogenic,
progression or infectious). Such genes associated with the altered state are likely to
be the targets of drug discovery. Whether the genes are the cause or the effect of the
condition, identification of such genes provides insight into which gene expression
needs to be re-altered in order to reestablish the healthy state.

Summary of the Invention

In one aspect, the invention provides methods for identifying gene(s)
which are differentially expressed, for example, in a normal healthy organism and an
organism having a disease. The method involves producing and comparing
hybridization patterns formed between samples of expressed mRNA or cDNA
polynucleotide sequences obtained from either analogous cells, tissues or organs of a
healthy organism and a diseased organism and a defined set of
oligonucleotide/polynucleotide sequence probes from either a
healthy organism or a diseased organism immobilized on a support. Those defined
oligonucleotide/polynucleotide sequences are representative of the total expressed
genetic component of the cells, tissues, organs or organism as defined by the collection
of partial cDNA sequences (ESTs). The differences between the hybridization
patterns permit identification of those particular EST or gene-specific
oligonucleotide/polynucleotide sequences associated with differential expression, and
the identification of the EST permits identification of the clone from which it was
derived and, using ordinary skill, further cloning and, if desired, sequencing of the
full-length cDNA and genomic counterpart, i.e., gene, from which it was obtained.
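
By way of illustration only, the comparison of two hybridization patterns described above might be sketched as follows. The signal values, probe identifiers, clone names, the two-fold threshold and the function name `differential_probes` are all hypothetical and are not taken from this specification; the sketch merely shows how a difference between patterns leads back from a probe to the clone from which its EST was derived.

```python
from typing import Dict, List

def differential_probes(healthy: Dict[str, float],
                        diseased: Dict[str, float],
                        min_fold: float = 2.0,
                        floor: float = 1.0) -> List[str]:
    """Return IDs of probes whose signal differs by at least min_fold
    (in either direction) between the healthy and diseased patterns."""
    flagged = []
    for probe_id in sorted(healthy.keys() & diseased.keys()):
        # Clamp to a small floor so an absent signal does not divide by zero.
        h = max(healthy[probe_id], floor)
        d = max(diseased[probe_id], floor)
        ratio = d / h
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            flagged.append(probe_id)
    return flagged

# Hypothetical signal intensities read from the same predefined regions.
healthy = {"EST001_p1": 120.0, "EST002_p1": 15.0, "EST003_p1": 300.0}
diseased = {"EST001_p1": 115.0, "EST002_p1": 240.0, "EST003_p1": 0.0}

# Hypothetical record of which clone each immobilized probe came from.
probe_to_clone = {"EST001_p1": "clone_A",
                  "EST002_p1": "clone_B",
                  "EST003_p1": "clone_C"}

hits = differential_probes(healthy, diseased)
clones = [probe_to_clone[p] for p in hits]
print(hits, clones)
```

No knowledge of gene function enters the comparison; only the probe identity and its source clone are needed to proceed to cloning and sequencing.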

In another aspect, the invention provides methods similar
to those described above, but which permit identification of those gene(s) of a
pathogen which are expressed in any biological sample of an infected organism based
on comparative hybridization of RNA/cDNA samples derived from a healthy versus
infected organism, hybridized to an oligonucleotide/polynucleotide set representative
of the gene coding complement of the pathogen of interest.

In another aspect, the invention provides methods similar
to those described above, but which permit identification of those EST-specific
oligonucleotide/polynucleotide sequences of host gene(s) which represent genes being
differentially expressed/altered in expression by the disease state or infection and are
expressed in any biological sample of an infected organism based on comparative
hybridization of RNA/cDNA samples derived from a healthy versus infected
organism of interest.

In a further aspect, the methods described above and in detail below
also provide methods for diagnosis of diseases or infections characterized by
differentially expressed genes, the expression of which has been altered as a result of
infection by the pathogen or disease causing agent in question. All identified
differences provide the basis for diagnostic testing, be it the altered expression of
endogenous genes or the patterned expression of the genes of the infecting organism.
Such patterns of altered expression are defined by comparing RNA/cDNA from the
two states hybridized against a panel of oligonucleotide/polynucleotides representing
the expressed gene component of a cell, tissue, organ or organism as defined by its
collection of ESTs.

Yet a further aspect of this invention provides a composition suitable
for use in hybridization, which comprises a solid surface on which is immobilized at
pre-defined regions thereon a plurality of defined oligonucleotide/polynucleotide
sequences for hybridization, each sequence comprising a fragment of an EST isolated
from a cDNA or DNA library prepared from at least one selected tissue or cell
sample of a healthy (i.e., pre-disease state) animal, at least one analogous sample of
an animal having a disease, at least one analogous sample of an animal infected with a
pathogen or the pathogen itself, or any combination or multiple combinations thereof.

An additional aspect of the invention provides an isolated gene
sequence which is differentially expressed in a normal healthy animal and an animal
having a disease, and is identified by the methods above. Similarly, an isolated
pathogen gene sequence which is expressed in tissue or cell samples of an infected
animal can be identified by the methods above.

Yet another aspect of the invention is that it provides not only a means
for a static diagnostic but also a means for carrying out the procedure over
time to measure disease progression as well as monitoring the efficacy of disease
treatment regimes, including any toxicological effects.

Another aspect of the invention is an isolated protein produced by
expression of the gene sequences identified above. Such proteins are useful in
therapeutic compositions or diagnostic compositions, or as targets for drug
development.

Other aspects and advantages of the present invention are described
further in the following detailed description of the preferred embodiments thereof.

Detailed Description of the Invention

The present invention meets the unfulfilled needs in the art by
providing methods for the identification and use of gene fragments and genes, even
those of unknown full length sequence and unknown function, which are
differentially expressed in a healthy animal and in an animal having a specific disease
or infection by use of ESTs derived from DNA libraries of healthy and/or
diseased/infected animals. Employing the methods of this invention permits the
resulting identification and isolation of such genes by using their corresponding ESTs
and thereby also permits the production of protein products encoded by such genes.
The genes themselves and/or protein products, if desired, may be employed in the
diagnosis or therapy of the disease or infection with which the genes are associated
and in the development of new drugs therefor.

It has been appreciated that one or more differentially identified EST
or gene-specific oligonucleotide/polynucleotides define a pattern of differentially
expressed genes diagnostic of a predisease, disease, or infective state. A knowledge of
the specific biological function of the EST is not required, only that the EST
identifies a gene or genes whose altered expression is associated reproducibly with
the predisease, disease or infectious state. The differences permit the identification of
gene products altered in their expression by the disease and represent those products
most likely to be targets of therapeutic intervention. Similarly, the product may be of
the infecting organism itself and also be an effective target of intervention.

I. Definitions.

Several words and phrases used throughout this specification are
defined as follows:

As used herein, the term "gene" refers to the genomic nucleotide
sequence from which a cDNA sequence is derived, which cDNA produces an EST, as
described below. The term "gene" classically refers to the genomic sequence, which,
upon processing, can produce different cDNAs, e.g., by splicing events. However,
for ease of reading, any full-length counterpart cDNA sequence which gives rise to an
EST will also be referred to by shorthand herein as a "gene".

The term "organism" includes without limitation [...]
animal.

The term "animal" is used in its broadest sense to include all members
of the animal kingdom, including humans. It should be understood, however, that
according to this invention the same species of animal which provides the biological
sample also is the source of the defined immobilized oligonucleotide/polynucleotides
as defined below.

The term "pathogen" is defined herein as any molecule or organism
which is capable of infecting an animal or plant and replicating its nucleic acid
sequences in the cells or tissues of that animal or plant. Such a pathogen is generally
associated with a disease condition in the infected animal or plant. Such pathogens
may include viruses, which replicate intra- or extra-cellularly, or other organisms,
such as bacteria, fungi or parasites, which generally infect tissues or the blood.
Certain pathogens or microorganisms are known to exist in sequential and
distinguishable stages of development, e.g., latent stages, infective stages, and stages
which cause symptomatic diseases. In these different stages, the pathogens are
anticipated to express differentially certain genes and/or turn on or off host cell gene
expression.

As used herein, the term "disease" or "disease state" refers to any
condition which deviates from a normal or standardized healthy state in an organism
of the same species in terms of differential expression of the organism's genes. In
other words, a disease state can be any illness or disorder, be it of genetic or
environmental origin, for example, an inherited disorder such as certain breast
cancers, or a disorder which is characterized by expression of gene(s) normally in an
inactive, "turned off" state in a healthy animal, or a disorder which is characterized by
under-expression or no expression of gene(s) which is normally activated or "turned
on" in a normal healthy animal. Such differential expression of genes may also be
detected in a condition caused by infection, inflammation, or allergy, a condition
caused by development or aging of the animal, a condition caused by administration
of a drug or exposure of the animal to another agent, e.g., nutrition, which affects
gene expression. Essentially, the methods described herein can be adapted to detect
differential gene expression resulting from any cause, by manipulation of the defined
oligonucleotide/polynucleotides and the samples tested as described below. The
concept of disease or disease state also includes its temporal aspects in terms of
progression and treatment.

The phrase "differentially expressed" refers to those situations in
which a gene transcript is found in differing numbers of copies, or in activated vs.
inactivated states, in different cell types or tissue types of an organism having a
selected disease as contrasted to the levels of the gene transcript found in the same
cells or tissues of a healthy organism. Genes may be differentially expressed in
differing states of activation in microorganisms or pathogens in different stages of
development. For example, multiple copies of gene transcripts may be found in an
organism having a selected disease, while only one, or significantly fewer copies, of
the same gene transcript are found in a healthy organism, or vice-versa.

As used herein, the term "solid support" refers to any known substrate
which is useful for the immobilization of large numbers of
oligonucleotide/polynucleotide sequences by any available method to enable
detectable hybridization of the immobilized oligonucleotide/polynucleotide sequences
with other polynucleotide sequences in a sample. Among a number of available solid
supports, one desirable example is the supports described in International Patent
Application No. WO91/07087, published May 30, 1991. Also useful are supports such
as but not limited to nitrocellulose, nylon, glass, silica and Pall Biodyne C®. It is
also anticipated that improvements yet to be made to conventional solid supports may
also be employed in this invention.

The term "surface" means any generally two-dimensional structure on
a solid support to which the desired oligonucleotide/polynucleotide sequence is
attached or immobilized. A surface may have steps, ridges, kinks, terraces and the
like.

As used herein, the term "predefined region" refers to a localized area
on a surface of a solid support on which is immobilized one or multiple copies of a
particular oligonucleotide/polynucleotide sequence and which enables the
identification of the oligonucleotide/polynucleotide at the position, if hybridization of
that oligonucleotide/polynucleotide to a sample polynucleotide occurs.
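
By way of illustration only, the relationship between a predefined region and the identity of the sequence immobilized there can be pictured as a simple lookup table. The row-and-column layout, the EST identifiers, the probe sequences and the function name `layout_probes` below are hypothetical and do not describe any particular support.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]          # (row, column) of a predefined region
Probe = Tuple[str, str]           # (EST identifier, probe sequence)

def layout_probes(probes_by_est: Dict[str, List[str]],
                  columns: int) -> Dict[Region, Probe]:
    """Assign each defined sequence to its own region, filling row by row."""
    layout: Dict[Region, Probe] = {}
    index = 0
    for est_id in sorted(probes_by_est):
        for sequence in probes_by_est[est_id]:
            layout[divmod(index, columns)] = (est_id, sequence)
            index += 1
    return layout

# Hypothetical 20-mers derived from two hypothetical ESTs.
probes_by_est = {
    "EST_A": ["ACGTACGTACGTACGTACGT", "CGTACGTACGTACGTACGTA"],
    "EST_B": ["TTGACCTTGACCTTGACCTT"],
}
layout = layout_probes(probes_by_est, columns=2)

# A signal detected at region (1, 0) identifies the hybridizing sequence.
est_id, sequence = layout[(1, 0)]
print(est_id, sequence)
```

Because the table is fixed when the support is prepared, a signal at a given position is sufficient to identify which defined sequence hybridized.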

By "immobilized" is meant the attachment of the
oligonucleotide/polynucleotide to the solid support. Means of immobilization are
known and conventional to those of skill in the art, and may depend on the type of
support being used.

By "EST" or "Expressed Sequence Tag" is meant a partial DNA or
cDNA sequence of about 150 to 500, more preferably about 300, sequential
nucleotides of a longer sequence obtained from a genomic or cDNA library prepared
from a selected cell, cell type, tissue or tissue type, organ or organism, which longer
sequence corresponds to an mRNA of a gene found in that library. An EST is
generally DNA. One or more libraries made from a single tissue type typically
provide at least about 3000 different (i.e., unique) ESTs and potentially the full
complement of all possible ESTs representing all cDNAs, e.g., 50,000-100,000 in an
animal such as a human. Further background and information on the construction of
ESTs is described in M. D. Adams et al., Science, 252:1651-1656 (1991) and
International Application Number PCT/US92/05222.

As used herein, the term "defined oligonucleotide/polynucleotide
sequence" refers to a known nucleotide sequence fragment of a selected EST or gene.
This term is used interchangeably with the term "fragments of EST". These
sequential sequences are generally comprised of between about 15 to about 45
nucleotides and more preferably between about 20 to about 25 nucleotides in length.
Thus any single EST of 300 nucleotides in length may provide about 280 different
defined oligonucleotide/polynucleotide sequences of 20 nucleotides in length (e.g.,
20-mers). The lengths of the defined oligonucleotide/polynucleotides may be readily
increased or decreased as desired or needed, depending on the limitations of the solid
support on which they may be immobilized or the requirements of the hybridization
conditions to be employed. The length is generally guided by the principle that it
should be of sufficient length to ensure that it is on average only represented once in
the population to be examined. Generally, these defined
oligonucleotide/polynucleotides are RNA or DNA and are preferably derived from
the anti-sense strand of the EST sequence from a corresponding mRNA sequence
to enable their hybridization with samples of RNA or DNA. Modified nucleotides
may be incorporated to increase stability and hybridization properties.
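
By way of illustration only, the arithmetic behind the preceding paragraph can be sketched as follows. Sliding a 20-nucleotide window one base at a time along a 300-nucleotide EST gives 300 - 20 + 1 = 281 fragments, consistent with "about 280". The random placeholder sequence, the population size of 3 x 10^9 nucleotides, the assumption of equal base frequencies and all function names are hypothetical and are used only to make the example concrete.

```python
import random

def tile_oligos(est: str, k: int = 20) -> list:
    """Return every overlapping k-mer of an EST, one per start position."""
    return [est[i:i + k] for i in range(len(est) - k + 1)]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def antisense(oligo: str) -> str:
    """Reverse complement of a DNA oligonucleotide (anti-sense strand)."""
    return oligo.translate(_COMPLEMENT)[::-1]

def expected_random_occurrences(k: int, population_nt: float) -> float:
    """Expected chance matches of one specific k-mer in population_nt
    nucleotides of random sequence with equal base frequencies."""
    return population_nt / 4 ** k

# Placeholder 300-nucleotide EST; a real EST would come from a library.
est = "".join(random.Random(0).choices("ACGT", k=300))

oligos = tile_oligos(est, k=20)
probes = [antisense(o) for o in oligos]
print(len(oligos))                                # number of 20-mer windows
print(expected_random_occurrences(15, 3e9))       # more than one chance match
print(expected_random_occurrences(20, 3e9))       # far fewer than one
```

Under these assumptions a 15-mer would be expected to occur by chance more than once in the population, whereas a 20-mer would not, which is in line with the stated preference for fragments of about 20 to 25 nucleotides.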

By the term "plurality of defined oligonucleotide/polynucleotide
sequences" is meant the following. A surface of a solid support may immobilize a
large number of "defined oligonucleotide/polynucleotides". For example, depending
upon the nature of the surface, it can immobilize from about 300 to upwards of
60,000 defined 20-mer oligonucleotide/polynucleotides. It is anticipated that future
improvements to solid surfaces will permit considerably larger such pluralities to be
immobilized on a single surface. A "plurality" of sequences refers to the use on any
one solid support of multiple different defined oligonucleotide/polynucleotides from a
single EST from a selected library, as well as multiple different defined
oligonucleotide/polynucleotides from different ESTs from the same library or many
libraries from the same or different tissues, and may also include multiple identical
copies of defined oligonucleotide/polynucleotides. Ultimately a plurality has at least
one oligonucleotide/polynucleotide per expressed gene in the entire organism. For
example, from a library producing about 5,000-10,000 ESTs, a single support can
include at least about 1-20 defined oligonucleotide/polynucleotides representing every
EST in that library. The composition of defined oligonucleotide/polynucleotides
which make up a surface according to this invention may be selected or designed as
desired.

The term "sample" is employed in the description of this invention in
several important ways. As used herein, the term "sample" encompasses any cell or
tissue from an organism. Any desired cell or tissue type in any desired state may be
selected to form a sample. For example, the sample cell desired may be a human T
cell; the desired cell type for use in this invention may be a quiescent T cell or an
activated T cell.

By the phrase "analogous sample" or "analogous cell or tissue" is
meant that according to this invention, when the ESTs which provide the defined
oligonucleotide/polynucleotides are produced from a cDNA library prepared from a
single tissue or cell type source sample, e.g., liver tissue of a human, then the samples
used to hybridize to those immobilized defined oligonucleotide/polynucleotides are
preferably provided by the same type of sample from either a healthy or diseased
animal, i.e., liver tissue of a healthy human and liver tissue of a diseased or infected
human or from a human suspected of having that disease or infection. Alternatively,
if the surface contains defined oligonucleotide/polynucleotides from multiple cells or
tissues, then the "samples" which are hybridized thereto can be but are not limited to
samples obtained from analogous multiple tissues or cells.

By the term "detectably hybridizing" is meant that the sample from the
healthy organism or diseased or infected organism is contacted with the defined
oligonucleotide/polynucleotides on the surface for sufficient time to permit the
formation of patterns of hybridization on the surfaces caused by hybridization
between certain polynucleotide sequences in the samples with certain immobilized
defined oligonucleotide/polynucleotides. These patterns are made detectable by the
use of available conventional techniques, such as fluorescent labelling of the samples.
Preferably hybridization takes place under stringent conditions, e.g., revealing
homologies of about 95%. However, if desired, other less stringent conditions may
be selected. Techniques and conditions for hybridization at selected stringencies are
well known in the art [see, e.g., Sambrook et al., Molecular Cloning. A Laboratory
Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989)].
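
By way of illustration only, the figure of about 95% homology can be made concrete for a 20-mer: a single mismatched position in 20 leaves 19/20 = 95% identity. In practice stringency is set by physical conditions such as temperature and salt concentration, not by computation; the sketch below only shows the numerical meaning of the percentage. It assumes, for simplicity, that the immobilized sequence and the region of the sample it pairs with are written in the same sense and have equal length. The sequences and function names are hypothetical.

```python
def percent_identity(probe: str, target: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(probe) != len(target):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(probe, target))
    return 100.0 * matches / len(probe)

def meets_stringency(probe: str, target: str, threshold: float = 95.0) -> bool:
    """True if the pair reaches the chosen identity threshold."""
    return percent_identity(probe, target) >= threshold

probe = "ACGTACGTACGTACGTACGT"          # hypothetical 20-mer
one_mismatch = "ACGTACGTACTTACGTACGT"   # differs at one position
two_mismatches = "ACGTACGTACTTACGTACGA" # differs at two positions

print(percent_identity(probe, one_mismatch))
print(meets_stringency(probe, one_mismatch))
print(meets_stringency(probe, two_mismatches))
```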

II. Compositions of the Invention

The present invention is based upon the use of ESTs from any desired
cell or tissue in known technologies for oligonucleotide/polynucleotide hybridization.

A. ESTs

An EST, as defined above, is, for an animal, a sequence from a
cDNA clone that corresponds to an mRNA. The EST sequences useful in the present
invention are isolated preferably from cDNA libraries using a rapid screening and
sequencing technique. Custom made cDNA libraries are made using known
techniques. See, generally, Sambrook et al., cited above. Briefly, mRNA from a
selected cell or tissue is reverse transcribed into complementary DNA (cDNA) using
the reverse transcriptase enzyme and made double-stranded using [...]
with DNA polymerase or reverse transcriptase. Restriction enzyme sites are added to
the cDNA and it is cloned into a vector. The result is a cDNA library. Alternatively,
commercially available cDNA libraries may be used. Libraries of cDNA can also be
generated from recombinant expression of genomic DNA using known techniques,
including polymerase chain reaction-derived techniques.



ESTs (which can range from about 150 to about 500 nucleotides in
length, preferably about 300 nucleotides) can be obtained through sequence analysis
from either end of the cDNA insert. Desirably, the DNA libraries used to obtain
ESTs use directional cloning methods so that either the 5' end of the cDNA (likely to
contain coding sequence) or the 3' end (likely to be a non-coding sequence) can be
selectively obtained.

In general, the method for obtaining ESTs comprises applying
conventional automated DNA sequencing technology to screen clones,
advantageously randomly selected clones, from a cDNA library. The cDNA libraries
from the desired tissue can be preprocessed, or edited, by conventional techniques to
reduce repeated sequencing of high and intermediate abundance clones and to
maximize the chances of finding rare messages from specific cell populations.
Preferably, preprocessing includes the use of defined composition prescreening
probes, e.g., cDNA corresponding to mitochondria, abundant sequences, ribosomes,
actins, myelin basic polypeptides, or any other known high abundance peptide. These
prescreening probes used for preprocessing are generally derived from known ESTs.
Other useful preprocessing techniques include subtraction hybridization, which
preferentially reduces the population of highly represented sequences in the library
[e.g., see Fargnoli et al., Anal. Biochem., 187:364 (1990)] and normalization, which
results in all sequences being represented in approximately equal proportions in the
library [Patanjali et al., Proc. Natl. Acad. Sci. USA, 88:1943 (1991)]. Additional
prescreening/differential screening approaches are known to those skilled in the art.

ESTs can then be generated from partial DNA sequencing of the
selected clones. The ESTs useful in the present invention are preferably generated
using low redundancy of sequencing, typically a single sequencing reaction. While
single sequencing reactions may have an accuracy as low as 90%, this nevertheless
provides sufficient fidelity for identification of the sequence and design of
primers.

If desired, the location of an EST in a full-length cDNA is determined
by analyzing the EST for the presence of coding sequence. A conventional computer
program is used to predict the extent and orientation of the coding region of a
sequence (using all six reading frames). Based on this information, it is possible to
infer the presence of start or stop codons within a sequence and whether the sequence
is completely coding or completely non-coding or a combination of the two. If start
or stop codons are present, then the EST can cover both part of the 5'-untranslated or
3'-untranslated part of the mRNA (respectively), as well as part of the coding
sequence. If no coding sequence is present, it is likely that the EST is derived from
the 3' untranslated sequence due to its longer length and the fact that most cDNA
library construction methods are biased toward the 3' end of the mRNA. It should be
understood that both coding and non-coding regions may provide ESTs equally useful
in the described invention.
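
By way of illustration only, an analysis of all six reading frames of the kind referred to above might be sketched as follows. This is a simplified stand-in and not the conventional computer program mentioned in the text: it merely reports, for each frame, the longest run of codons uninterrupted by a stop codon and whether a start (ATG) or stop codon occurs. The example sequence and all names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def six_frame_summary(est: str) -> dict:
    """Summarize start/stop codon content in the three forward (+1..+3)
    and three reverse (-1..-3) reading frames of a DNA sequence."""
    summary = {}
    for strand, seq in (("+", est), ("-", reverse_complement(est))):
        for offset in range(3):
            # Split the strand into complete codons starting at this offset.
            codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
            longest = run = 0
            for codon in codons:
                if codon in STOP_CODONS:
                    run = 0            # a stop codon ends the open stretch
                else:
                    run += 1
                    longest = max(longest, run)
            summary[f"{strand}{offset + 1}"] = {
                "longest_open_codons": longest,
                "has_start": "ATG" in codons,
                "has_stop": any(c in STOP_CODONS for c in codons),
            }
    return summary

# Hypothetical fragment: ATG GCC AAA TTT GGG TAA ACC in frame +1.
frames = six_frame_summary("ATGGCCAAATTTGGGTAAACC")
print(frames["+1"])
```

A frame with a long open stretch suggests coding sequence, while the absence of any such frame is consistent with an EST lying in untranslated sequence.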

A number of specific ESTs suitable for use in the present
invention are described in Adams et al. (supra), which may be incorporated by
reference herein, to describe non-essential examples of desirable ESTs. Other ESTs
exist in the art which may also be useful in this invention, as will ESTs yet to be
developed by these known techniques.

B. Preparing the Solid Support of the Invention

Oligonucleotide sequences which are fragments of defined
sequence are derived from each EST by conventional means, e.g., conventional
chemical synthesis or recombinant techniques. Each defined
oligonucleotide/polynucleotide sequence as described above is a fragment, and can be,
but is not necessarily, an anti-sense fragment, of an EST isolated from a DNA library
prepared from a selected cell or tissue type from a selected animal. For use in the
present invention, it is presently preferred that the defined
oligonucleotide/polynucleotide sequences are 20-25mers. As described above, for
each EST a number of such 20-25mers may be generated. The lengths may vary as
described above, as may the composition. For example,
oligonucleotide/polynucleotides can be modified based on the Oligo 4.0 or similar
programs to predict hybridization potential or to include modified nucleotides for the
reasons given above. It is also appreciated that large DNA segments may be
employed, including entire ESTs or even full length genes, particularly when inserted
into cloning vectors.
A plurality of these defined oligonucleotide/polynucleotide
sequences are then attached to a selected solid support conventionally used for the
attachment of nucleotide sequences. In contrast to other
technologies available in the art, this invention uses defined, not
random, oligonucleotide/polynucleotide sequences. The EST fragments, or defined
oligonucleotide/polynucleotide sequences, immobilized on the solid support can
include fragments of one or more ESTs from a library of at least one selected tissue
or cell sample of a healthy animal, at least one analogous sample of the animal having
a disease, at least one analogous sample of the animal infected with a pathogen, or
any combination thereof.

Numerous conventional methods are employed for attaching
biological molecules such as oligonucleotide/polynucleotide sequences to surfaces of
a variety of solid supports. See, e.g., Affinity Techniques, Enzyme Purification: Part
B, Methods in Enzymology, Vol. 34, ed. W. B. Jakoby, M. Wilchek, Acad. Press,
NY (1974); Immobilized Biochemicals and Affinity Chromatography, Advances in
Experimental Medicine and Biology, vol. 42, ed. R. Dunlap, Plenum Press, NY
(1974); U.S. Patent No. 4,762,881; U.S. Patent No. 4,542,102; European Patent
Publication No. 391,608 (October 10, 1990); U.S. Patent No. 4,992,127 (Nov. 21,
1989).

One desirable method for attaching oligonucleotide/polynucleotide sequences
derived from ESTs to a solid support is described in International Application
No. PCT/US90/06607 (published May 30, 1991). Briefly, this method involves
forming predefined regions on a surface of a solid support, where the predefined
regions are capable of immobilizing ESTs. The methods make use of binding
substances attached to the surface which enable selective activation of the
predefined regions. Upon activation, these binding substances become capable of
binding and immobilizing oligonucleotide/polynucleotides based on EST or longer
gene sequences.

Any of the known solid substrates suitable for binding
oligonucleotide/polynucleotides at pre-defined regions on the surface thereof for
hybridization and methods for attaching the oligonucleotide/polynucleotides
thereto may be employed by one of skill in the art according to this invention.
Similarly, known conventional methods for making hybridization of the immobilized
oligonucleotide/polynucleotides detectable, e.g., fluorescence, radioactivity,
photoactivation, biotinylation, solid state circuitry, and the like may be used
in this invention.

Thus, by resorting to known techniques, the invention provides a composition
suitable for use in hybridization which consists of a surface of a solid support
on which is immobilized at pre-defined regions on said surface a plurality of
defined oligonucleotide/polynucleotide sequences for hybridization. For example,
one composition of this invention is a solid support on which are immobilized
oligos of EST fragments from a library constructed from a single cell type, e.g.,
a human stem cell, or a single tissue, e.g., human liver, from a healthy human.
Still another composition of this invention is another solid support on which are
immobilized oligos of EST fragments from a library constructed from a single cell
type or a tissue from a human having a selected disease or predisposition to a
selected disease, e.g., liver cancer.

Another embodiment of the compositions of this invention includes a single solid
support having oligonucleotides of ESTs from both single cell or single tissue
libraries from both a healthy and diseased human. Still other embodiments include
a single support on which are immobilized oligos of EST fragments from more than
one tissue or cell library from a healthy human, or a single support on which are
immobilized more than one tissue or cell library from both healthy and diseased
animals or humans. A preferred composition is anticipated to be a single support
containing oligos of ESTs for all known cells and tissues from a selected
organism.
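One way to picture such a composition is as a lookup from each pre-defined region
to the oligo immobilized there, the EST it came from, and the source library. The
sketch below is illustrative only: the `Region` record, the row-by-row grid
placement, the EST identifiers, and the library labels are all assumptions made
for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One pre-defined region of the solid support (hypothetical record)."""
    row: int
    col: int
    oligo: str     # defined oligonucleotide immobilized at this region
    est_id: str    # EST the oligo was derived from (made-up identifier)
    library: str   # library of origin, e.g. healthy or diseased tissue


def build_support(entries, columns):
    """Place (oligo, est_id, library) entries row by row on a grid."""
    support = {}
    for index, (oligo, est_id, library) in enumerate(entries):
        row, col = divmod(index, columns)
        support[(row, col)] = Region(row, col, oligo, est_id, library)
    return support


# A single support carrying oligos from a healthy and a diseased library.
entries = [
    ("ATGGCCAAGCTTGGATCCGA", "EST_H1", "healthy liver"),
    ("TTGGATCCGAATTCTGCAGG", "EST_H2", "healthy liver"),
    ("ATTCTGCAGGTACCTTAAGC", "EST_D1", "liver cancer"),
]
support = build_support(entries, columns=2)
libraries = {region.library for region in support.values()}
```

Because every region is defined in advance, a signal at position `(1, 0)` can be
read directly as a signal for `EST_D1` from the diseased library.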



III. The Methods of the Invention

A. Identification of Genes

The present invention employs the compositions described above in methods for
identifying genes which are differentially expressed in a normal healthy organism
and an organism having a disease or infection. These methods may be employed to
detect such genes, regardless of the state of knowledge about the function of the
gene. The method of this invention, by use of the compositions containing
multiple defined EST fragments from a single gene as described above, is able to
detect levels of expression of genes or, in other cases, simply the expression,
or lack thereof, which differ between normal, healthy organisms and organisms
having a selected disease, disorder or infection.

One such method employs a first surface of a solid support on which is
immobilized at pre-defined regions thereon a plurality of defined
oligonucleotide/polynucleotide sequences, described above, of EST or longer gene
fragment isolated from a cDNA library prepared from at least one selected tissue
or cell sample of a healthy animal (the "healthy test surface") and a second such
surface on which is immobilized at pre-defined regions a plurality of defined
oligonucleotide/polynucleotide sequences of EST or longer gene fragment isolated
from at least one analogous tissue of an animal having a selected disease (the
"disease test surface"). These test surfaces may be standardized for the selected
animal or selected cell or tissue sample from that animal (i.e., they are
prescreened for polymorphisms in the species population).

Polynucleotide sequences are then isolated from mRNA and/or cDNA from a
biological sample from a known healthy animal ("healthy control") and a second
sample is similarly prepared from a sample from a known diseased animal ("disease
sample"). These two samples are desirably selected from the cell or tissue
analogous to that which provided the immobilized oligonucleotide/polynucleotides.

According to the method, the healthy control sample is contacted with one set of
the healthy test surface and the disease test surface described above for a time
sufficient to permit detectable hybridization to occur between the sample and the
immobilized defined oligonucleotide/polynucleotides on each surface. The results
of this hybridization are a first hybridization pattern formed between the
nucleotides of healthy control and the healthy test surface and a second
hybridization pattern formed between the nucleotides of healthy control sample
and the disease test surface.

In a similar manner, the disease sample is detectably hybridized to another set
of healthy test and disease test surfaces, forming a third hybridization pattern
between the disease sample and healthy test surface and a fourth hybridization
pattern between the disease sample and the disease test surface.

Comparing the four hybridization patterns permits detection of those defined
oligonucleotide/polynucleotides which are differentially expressed between the
healthy control and the disease sample by the presence of differences in the
hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding EST or longer gene
fragment from which the oligonucleotide/polynucleotides are obtained.
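The comparison of the four patterns can be sketched in code. The example below
is an illustration under stated assumptions: each pattern is represented as a
mapping from a region label to a normalized signal, a "difference" is taken to be
an absolute signal change of at least `min_diff`, and the region labels, EST
identifiers, and signal values are invented for the example. The specification
does not prescribe any particular numerical criterion.

```python
def differing_regions(pattern_a, pattern_b, min_diff=0.5):
    """Regions whose signals differ between two patterns on copies of one surface."""
    shared = pattern_a.keys() & pattern_b.keys()
    return sorted(r for r in shared if abs(pattern_a[r] - pattern_b[r]) >= min_diff)


def differential_ests(p1, p2, p3, p4, healthy_layout, disease_layout, min_diff=0.5):
    """Map pattern differences back to the ESTs that supplied the oligos.

    p1/p3: healthy control / disease sample on the healthy test surface.
    p2/p4: healthy control / disease sample on the disease test surface.
    """
    hits = set()
    for region in differing_regions(p1, p3, min_diff):
        hits.add(healthy_layout[region])
    for region in differing_regions(p2, p4, min_diff):
        hits.add(disease_layout[region])
    return sorted(hits)


# Hypothetical layouts (region -> EST) and normalized signals.
healthy_layout = {"A1": "EST_H1", "A2": "EST_H2"}
disease_layout = {"A1": "EST_D1", "A2": "EST_D2"}
p1 = {"A1": 1.0, "A2": 0.9}
p2 = {"A1": 0.1, "A2": 0.8}
p3 = {"A1": 1.0, "A2": 0.2}
p4 = {"A1": 0.9, "A2": 0.8}
hits = differential_ests(p1, p2, p3, p4, healthy_layout, disease_layout)
```

In this toy data, region A2 of the healthy test surface and region A1 of the
disease test surface change between samples, so `EST_H2` and `EST_D1` are
reported as candidates for follow-up.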

In another embodiment of the method of this invention, the same process is
employed, with the exception that the plurality of defined
oligonucleotide/polynucleotide sequences forming the healthy test sample and the
disease test sample surfaces are immobilized on a single solid support. For
example, each fragment of an EST or longer gene fragment on the surface is
isolated from at least two cDNA libraries prepared from a selected cell or tissue
sample of a healthy animal and an analogous selected cell or tissue sample of an
animal having a disease.

According to this embodiment, the healthy control sample is detectably hybridized
to a copy of this single solid surface, forming one hybridization pattern with
oligonucleotide/polynucleotides associated with both the healthy and diseased
animal. Similarly, the disease sample is detectably hybridized to a second copy
of this single solid surface, forming one hybridization pattern with
oligonucleotide/polynucleotides associated with both the healthy and diseased
animal.

Comparing the two hybridization patterns permits detection of those defined
oligonucleotide/polynucleotides which are differentially expressed between the
healthy control and the disease sample by the presence of differences in the
hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding EST or longer gene
fragment from which the oligonucleotide/polynucleotides are obtained.

The identification of one or more ESTs as the source of the defined
oligonucleotide/polynucleotide which produced "differences" in hybridization
patterns according to these methods permits ready identification of the gene from
which those ESTs were derived. Because oligonucleotides are of sufficient length
that they will hybridize under stringent conditions only with an RNA/cDNA for
that gene to which they correspond, the oligo can be used to identify the EST and
in turn the clone from which it was derived and, by subsequent cloning, to obtain
the sequence of the full-length cDNA and its genomic counterparts, i.e., the
gene, from which it was obtained.

In other words, the ESTs identified by the method of this invention can be
employed to determine the complete sequence of the mRNA, in the form of
transcribed cDNA, by using the EST as a probe to identify a cDNA clone
corresponding to a full-length transcript, followed by sequencing of that clone.
The EST or the full length cDNA clone can also be used as a probe to identify a
genomic clone or clones that contain the complete gene including regulatory and
promoter regions, exons, and introns.

It should be appreciated that one does not have to be restricted to using ESTs
from a particular tissue from which probe RNA or cDNA is obtained; rather, any or
all ESTs (known or unknown) may be placed on the support. Hybridization will be
used to form diagnostic patterns or to identify which particular EST is detected.
For example, all known ESTs from an organism are used to produce a "master" solid
support to which control sample and disease samples are alternately hybridized.
One then detects a pattern of hybridization associated with the particular
disease state, which then forms the basis of a diagnostic test or the isolation
of disease specific ESTs from which the intact gene may be cloned and sequenced,
leading ultimately to a defined therapeutic target.

Methods for obtaining complete gene sequences from ESTs are well-known to those
of skill in the art. See, generally, Sambrook et al, cited above. Briefly, one
suitable method involves purifying the DNA from the clone that was sequenced to
give the EST and labeling the isolated insert DNA. Suitable labeling systems are
well known to those of skill in the art [see, e.g., Basic Methods in Molecular
Biology, L. G. Davis et al, ed., Elsevier Press]. The labeled EST insert is then
used as a probe to screen a lambda phage cDNA library or a plasmid cDNA library,
identifying colonies containing clones related to the probe cDNA which can be
purified by known methods. The ends of the newly purified clones are then
sequenced to identify full length sequences, and complete sequencing of full
length clones is performed by enzymatic digestion or primer walking. A similar
screening and clone selection approach can be applied to clones from a genomic
DNA library.

Additionally, an EST or gene identified by this method as associated with
inherited disorders can be used to determine at what stage during embryonic
development the selected gene from which it is derived is developed by screening
embryonic DNA libraries from various stages of development, e.g., 2-cell, 8-cell,
etc., for the selected gene. As has been mentioned above, the invention may be
applied in additional temporal modes for monitoring the progression of a disease
state, the efficacy of a particular treatment modality or the aging process of an
individual.

Thus, the methods of this invention permit the identification, isolation and
sequencing of a gene which is differentially expressed in a selected
disease/infection. As described in more detail below, the identified gene may
then be employed to obtain any protein encoded thereby, or may be employed as a
target for diagnostic methods or therapeutic approaches to the treatment of the
disease, including, e.g., drug development.

The same methods as described above for the identification of genes, including
genes of unknown function, which are differentially expressed in a disease state,
may also be employed to identify other genes of interest. For example, another
embodiment of this invention includes a method for identifying a gene of a
pathogen which is expressed in a biological sample of an animal infected with
that pathogen or the gene of the host which is altered in its expression as a
result of the infection.

One such method employs a healthy test surface as described above, employing
defined oligonucleotide/polynucleotides from a sample of a healthy, uninfected
animal. The second such surface has immobilized at pre-defined regions thereon a
plurality of defined oligonucleotide/polynucleotide sequences of ESTs isolated
from at least one analogous tissue or cell sample of an infected animal (the
"infection test surface"). Polynucleotide sequences are isolated from a
biological sample from a healthy animal ("healthy control") and a second sample
is similarly prepared from an animal infected with the selected pathogen
("infection sample"). These two samples are desirably selected from the cell or
tissue analogous to that which provided the immobilized
oligonucleotide/polynucleotides. It may also be possible to provide samples from
the nucleic acid of the pathogen.

According to the method, the healthy control sample is contacted with one set of
the healthy test surface and the infection test surface described above for a
time sufficient to permit detectable hybridization to occur between the sample
and the immobilized defined oligonucleotide/polynucleotides on each surface. The
results of this hybridization are a first hybridization pattern formed between
the nucleotides of healthy control and the healthy test surface and a second
hybridization pattern formed between the nucleotides of healthy control sample
and the infection test surface.



In a similar manner, the infection sample is detectably hybridized to another set
of healthy test and infection test surfaces, forming a third hybridization
pattern between the infection sample and healthy test surface and a fourth
hybridization pattern between the infection sample and the infection test
surface.

Comparing the four hybridization patterns permits detection of those defined
oligonucleotide/polynucleotides which are differentially expressed between the
healthy animal and the animal infected with the pathogen by the presence of
differences in the hybridization patterns at pre-defined regions. As mentioned,
differential expression is not required, and simple qualitative analysis is
possible by reference to gene expression which is simply present or absent.
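The qualitative, present-or-absent reading mentioned above can be sketched as
follows. This is only an illustration: the `detection_limit` cutoff, the region
labels, and the signal values are assumptions, and the specification does not
state how a signal is judged to be present.

```python
def call_present(pattern, detection_limit=0.2):
    """Return the set of regions whose signal counts as 'present'."""
    return {region for region, signal in pattern.items() if signal >= detection_limit}


def presence_changes(healthy_pattern, infection_pattern, detection_limit=0.2):
    """List regions detected with only one of the two samples on the same surface."""
    healthy = call_present(healthy_pattern, detection_limit)
    infected = call_present(infection_pattern, detection_limit)
    return {
        "only_in_infection": sorted(infected - healthy),
        "only_in_healthy": sorted(healthy - infected),
    }


# Hypothetical normalized signals for three pre-defined regions.
healthy_pattern = {"A1": 0.9, "A2": 0.05, "A3": 0.6}
infection_pattern = {"A1": 0.8, "A2": 0.7, "A3": 0.0}
changes = presence_changes(healthy_pattern, infection_pattern)
```

A region that appears only with the infection sample points to either a pathogen
gene or a host gene switched on by the infection; a region that disappears points
to a host gene switched off.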

A second embodiment of this method parallels the second embodiment of the method
as applied to disease above, i.e., the same process is employed, with the
exception that the plurality of defined oligonucleotide/polynucleotide sequences
forming the healthy test sample surface and the infection test sample surface are
immobilized on a single solid support. The resulting first hybridization pattern
(healthy control sample with healthy/infection test sample) and second
hybridization pattern (infection sample with healthy/infection test sample)
permit detection of those defined oligonucleotide/polynucleotides which are
differentially expressed between the healthy control and the infection sample by
the presence of differences in the hybridization patterns at pre-defined regions.
The oligonucleotide/polynucleotides on each surface which correspond to the
pattern differences may be readily identified with the corresponding ESTs from
which the oligonucleotide/polynucleotides are obtained.

As described above for the methods for identifying differential gene expression
between diseased and healthy animals, the oligonucleotide/polynucleotides on each
surface which correspond to the pattern differences may be readily identified
with the corresponding ESTs from which the oligonucleotide/polynucleotide
sequences are obtained, and the genes of the host or pathogen identified for
similar purposes. Other embodiments of these methods may be developed with resort
to the teaching herein, by altering the samples which provide the defined
oligonucleotide/polynucleotides. For example, an EST identified with a
differentially expressed gene by the method of this invention is also useful in
detecting genes expressed in the various stages of a pathogen's development,
particularly the infective stage, and in following the course of drug treatment
and emergence of resistant variants. For example, employing the techniques
described above, the EST can be used for detecting a gene in various stages of
the parasite Plasmodium species life cycle, which include [...] gametocyte
stages.

In addition to the use of the methods of this invention for identifying
differentially expressed genes, another embodiment of this invention provides
diagnostic methods for diagnosing a selected disease state, or a selected state
resulting from aging, exposure to drugs or infection in an animal. According to
this aspect of the invention, a first surface, described as the healthy test
surface above, and a second surface, described as the disease test surface or
infection test surface, are prepared depending on the disease or infection to be
diagnosed. The same processes of detectable hybridization to a first and second
set of these surfaces with the healthy control sample and disease/infection
sample are followed to provide the four above-described hybridization patterns,
i.e., healthy control sample with healthy test surface; healthy control sample
with disease/infection test surface; disease/infection sample with healthy test
surface; and disease/infection sample with disease/infection test surface.

The diagnosis of disease or infection is provided by comparing the four
hybridization patterns. Substantial differences between the first and third
hybridization patterns, respectively, and the second and fourth hybridization
patterns, respectively, indicate the presence of the selected disease or
infection in said animal. Substantial similarities in the first and third
hybridization patterns and second and fourth hybridization patterns indicate the
absence of disease or infection.
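The diagnostic rule just stated can be written as a small decision function. The
sketch below rests on assumptions that are not in the specification: a pair of
patterns is called "substantially different" when at least a fraction
`substantial` of the shared regions change by `min_diff` or more, and a mixed
outcome (one pair different, the other similar) is reported as indeterminate.
All names and values are illustrative.

```python
def fraction_differing(pattern_a, pattern_b, min_diff=0.5):
    """Fraction of shared regions whose signal changes by at least min_diff."""
    shared = pattern_a.keys() & pattern_b.keys()
    if not shared:
        raise ValueError("patterns share no pre-defined regions")
    differing = sum(1 for r in shared if abs(pattern_a[r] - pattern_b[r]) >= min_diff)
    return differing / len(shared)


def diagnose(p1, p2, p3, p4, substantial=0.25):
    """Compare pattern 1 with 3 (healthy surface) and 2 with 4 (disease surface)."""
    healthy_surface_differs = fraction_differing(p1, p3) >= substantial
    disease_surface_differs = fraction_differing(p2, p4) >= substantial
    if healthy_surface_differs and disease_surface_differs:
        return "disease or infection indicated"
    if not healthy_surface_differs and not disease_surface_differs:
        return "no disease or infection indicated"
    return "indeterminate"  # assumption: the text does not cover mixed results


# Hypothetical signals: p1/p2 from the healthy control, p3/p4 from the test sample.
p1 = {"A1": 1.0, "A2": 0.9, "A3": 0.8, "A4": 0.1}
p2 = {"A1": 0.1, "A2": 0.1, "A3": 0.9, "A4": 0.9}
p3 = {"A1": 1.0, "A2": 0.1, "A3": 0.8, "A4": 0.1}
p4 = {"A1": 0.9, "A2": 0.1, "A3": 0.9, "A4": 0.9}
result = diagnose(p1, p2, p3, p4)
```

What counts as "substantial" would in practice have to be calibrated against
known healthy and diseased samples; the 25% figure here is a placeholder.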

A similar embodiment utilizes the single surface bearing both the healthy test
surface defined oligonucleotide/polynucleotides and the disease/infection test
surface defined oligonucleotide/polynucleotides as described above. Parallel
process steps as described above for detection of genes differentially expressed
in disease and infected states are followed, resulting in a first hybridization
pattern (healthy control sample with single healthy and disease/infection test
sample) and a second hybridization pattern (disease/infection sample with another
copy of the single healthy and disease/infection test sample).

Diagnosis is accomplished by comparing the two hybridization patterns, wherein
substantial differences between the first and second hybridization patterns
indicate the presence of the selected disease or infection in the animal being
tested. Substantially similar first and second hybridization patterns indicate
the absence of disease or infection. This, like many of the foregoing
embodiments, may use known or unknown ESTs derived from many libraries.

As is obvious to one of skill in the art upon reading this disclosure, the
compositions and methods of this invention may also be used for other similar
purposes, and may be adapted easily by manipulation of the samples selected to
provide the standardized defined oligonucleotide/polynucleotides, and selection
of the samples selected for hybridization thereto. One such modification is the
use of this invention to identify cell markers of any type, e.g., markers of
cancer cells, stem cell markers, and the like. Another modification involves the
use of the method and compositions to generate hybridization patterns useful for
forensic identification or an "expression fingerprint" of genes for
identification of one member of a species from another. Similarly, the methods of
this invention may be adapted for use in tissue matching for transplantation
purposes as well as for molecular histology, i.e., to enable diagnosis of disease
or disorders in pathology tissue samples such as biopsies. Still another use of
this method is in monitoring the effects of development and aging upon the gene
expression in a selected animal, by preparing surfaces bearing
oligonucleotide/polynucleotides prepared from samples of standardized younger
members of the species being tested. Additionally, the patient can serve as an
internal control by virtue of having the method applied to blood samples every
5-10 years during his lifetime.

Still another intriguing use of this method is in the area of monitoring the
effects of drugs on gene expression, both in laboratories and during clinical
trials with animals, especially humans. Because the method can be readily adapted
by altering the above parameters, it can essentially be employed to identify
differentially expressed genes of any organism, at any stage of development, and
under the influence of any factor which can affect gene expression.

IV. The Genes and Proteins Identified

Application of the compositions and methods of this invention as above described
also provides other compositions, such as any isolated gene sequence which is
differentially expressed between a normal healthy animal and an animal having a
disease or infection. Another embodiment of this invention is any isolated
pathogen gene sequence which is expressed in tissue or cell samples of an
infected animal. Similarly, an embodiment of this invention is any gene sequence
identified by the methods described herein.

These gene sequences may be employed in conventional methods to produce isolated
proteins encoded thereby. To produce a protein of this invention, the DNA
sequences of a desired gene identified by the methods of this invention or
portions thereof are inserted into a suitable expression system. Desirably, a
recombinant molecule or vector is constructed in which the polynucleotide
sequence encoding the protein is operably linked to a heterologous expression
control sequence permitting expression of the human protein. Numerous types of
appropriate expression vectors and host cell systems are known in the art for
mammalian (including human) expression, insect, e.g., baculovirus expression,
yeast, fungal, and bacterial expression, by standard molecular biology
techniques.

The transfection of these vectors into appropriate host cells, whether mammalian,
bacterial, fungal, or insect, or into appropriate viruses, can result in
expression of the selected proteins. Suitable host cells or cell lines for
transfection, and viruses, as well as methods for the construction and
transfection of such host cells and viruses, are well-known. Suitable methods for
transfection, culture, amplification, screening, and product production and
purification are also known in the art.

The genes and proteins identified by this invention can be employed, if desired,
in diagnostic compositions useful for the diagnosis of a disease or infection
using conventional diagnostic assays. For example, a diagnostic reagent can be
developed which detectably targets a gene sequence or protein of this invention
in a biological sample of an animal. Such a reagent may be a complementary
nucleotide sequence, an antibody (monoclonal, recombinant or polyclonal), or a
chemically derived agonist or antagonist. Alternatively, the proteins and
polynucleotide sequences of this invention, fragments of same, or complementary
sequences thereto, may themselves be useful as diagnostic reagents for diagnosing
disease states with which the ESTs of the invention are associated. These
reagents may optionally be labelled using diagnostic labels, such as radioactive
labels, colorimetric enzyme label systems and the like conventionally used in
diagnostic or therapeutic methods, e.g., Northern and Western blotting,
antigen-antibody binding and the like. The selection of the appropriate assay
format and label system is within the skill of the art and may readily be chosen
without requiring additional explanation by resort to the wealth of art in the
diagnostic area.

Additionally, genes and proteins identified according to this invention may be
used therapeutically. For example, these gene sequences may be useful in gene
therapy, to provide a gene sequence which in a disease is not properly or
sufficiently expressed. In such a method, a selected gene sequence of this
invention is introduced into a suitable vector or other delivery system for
delivery to a cell containing a defect in the selected gene. Suitable delivery
systems are well known to those of skill in the art and enable the gene to be
incorporated into the target cell and to be translated by the cell. The EST or
gene sequence may be introduced to mutate the existing gene by recombination, or
to provide an active copy thereof in addition to the inactive gene to replace its
function.

Alternatively, a protein encoded by an EST or gene of the invention may be useful
as a therapeutic reagent for delivery of a biologically active protein,
particularly when the disease state is associated with a deficiency of this
protein. Such a protein may be incorporated into an appropriate therapeutic
formulation, alone or in combination with other active ingredients. Methods of
formulating such therapeutic compositions, as well as suitable pharmaceutical
carriers and the like, are well known to those of skill in the art. Still an
additional method of delivering the missing protein encoded by an EST, or the
gene from which the selected EST was derived, involves expressing it directly in
vivo. Systems for such in vivo expression are well known in the art.

Yet another use of the ESTs, genes identified according to the methods of this
invention, or the proteins encoded thereby is as a target for the screening and
development of natural or synthetic chemical compounds which have utility as
therapeutic drugs for the treatment of disease states associated with the
identified genes and ESTs derived therefrom. As one example, a compound capable
of binding to such a protein encoded by such a gene and either preventing or
enhancing its biological activity may be a useful drug component for the
treatment or prevention of such disease states.

Conventional assays and techniques may be used for the screening and development
of such drugs. As one example, a method for identifying compounds which
specifically bind to or inhibit or activate proteins encoded by these gene
sequences can include simply the steps of contacting a selected protein or gene
product with a test compound to permit binding of the test compound to the
protein, and determining the amount of test compound, if any, which is bound to
the protein. Such a method may involve the incubation of the test compound and
the protein immobilized on a solid support. Still other conventional methods of
drug screening can involve employing a suitable computer program to determine
compounds having similar or complementary chemical structures to that of the gene
product or portions thereof and screening those compounds either for competitive
binding to the protein or to detect enhanced or decreased activity in the
presence of the selected compound.
Thus, through use of such methods, the present invention is anticipated to
provide compounds capable of interacting with these genes, ESTs, or encoded
proteins, or fragments thereof, and either enhancing or decreasing the biological
activity, as desired. Such compounds are believed to be encompassed by this
invention.

Numerous modifications and variations of the present invention are included in
the above-identified specification and are expected to be obvious to one of skill
in the art. Such modifications and alterations to the compositions and processes
of the present invention are believed to be encompassed in the scope of the
claims appended hereto.

WHAT IS CLAIMED IS:

-■'i' r-. 1. A method for identifying genes which are differentially expressed in 
two different pre-determined states of an organism coniprising: 
5' ^ a. providing a first surface on which is immobilized at (pre-defined; 

regions ^on said' surface ' a plurality m of defined < oligonuclcotide^lynucleodde 
sequences, each sequence selected fiom the group consisting of a firagment of an EST, 
an entire EST a'firagm»t of a gene or an entire s^e, isolated from; a DNA lifarary 
prepared fit>m' at least' cHie sdected cell, t^ or .organism sample^. in a first: 

10 state and present in excess relative to the polynucleotide to be hybridized; 

b. providing a second surface on which is immobilized at pre-defined 
regions on said :surface J a plurality ^of defined cligonucleotide/polynucleotide 
sequences, each sequence selected fiom- the ^up consisting of a fiugment of an EST,/ 
an entire EST^a fiagipent of a gene orr.an. entire gene, isolated £rom a DNA library 

15 prepared from at least one selected cell, tissue, organ or organism sample in a second 
state and present in excess relative to the polynucleotide to be hybridized; 

c. detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated from a saixiple from a said organism in said first 
state, said sample selected from sources analogous to the sources, of step (a), s^id 

20 hybridization sufficient to form a first and second hybridization pattern on each said 
first and second surface, 

d. detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated £rom a saxnple from said organism in ^d second 
state, said san^>le selected fiom sources analogous to the sources of step (c), said 

25 hybridization sufficient to form a third and fourtii hybridization pattern cm each said 
first and second surface, 

e. comparing at least two of the four hybridization patterns, 
wherein genes differentially expressed in said first and second states are identified by 
the presence of differences in the hybridization patterns at pre-defined regions;

f. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs or larger
gene fragment from which the oligonucleotide/polynucleotides were obtained,
whereby identification of the EST or larger gene fragment permits identification of
the gene from which the ESTs or larger gene fragment were derived.
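
As an illustration only, the comparison and identification of steps (e) and (f) might be sketched in Python as follows. The sketch assumes each hybridization pattern has been reduced to a signal intensity per pre-defined region. The names `differing_regions` and `identify_sources`, the two-fold cutoff, the EST labels and the sample values are all hypothetical and are not taken from the claims.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]        # (row, column) of a pre-defined region
Pattern = Dict[Region, float]   # assumed: signal intensity per region


def differing_regions(a: Pattern, b: Pattern, min_fold: float = 2.0) -> List[Region]:
    """Return regions whose signals differ by at least min_fold (hypothetical cutoff)."""
    out = []
    for region in sorted(a.keys() & b.keys()):
        lo, hi = sorted((a[region], b[region]))
        # A signal present in one pattern and absent (0) in the other counts as a difference.
        if hi > 0 and (lo == 0 or hi / lo >= min_fold):
            out.append(region)
    return out


def identify_sources(regions: List[Region], layout: Dict[Region, str]) -> List[str]:
    """Map differing regions back to the sequence immobilized there."""
    return [layout[r] for r in regions]


# Hypothetical surface layout and two patterns obtained on the same surface.
layout = {(0, 0): "EST-A", (0, 1): "EST-B", (1, 0): "EST-C"}
first_state = {(0, 0): 10.0, (0, 1): 5.0, (1, 0): 0.0}
second_state = {(0, 0): 11.0, (0, 1): 20.0, (1, 0): 4.0}

hits = differing_regions(first_state, second_state)
print(identify_sources(hits, layout))
```

Because each region holds a known sequence, a difference at a region points directly to the EST or gene fragment immobilized there, which is the link that step (f) relies on.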


2. The method according to claim 1 wherein said first and second states are
respectively healthy and disease; pathogen uninfected [illegible]; a first
progression state and a second progression of a disease or infection; a first treatment
state and a second treatment state of a disease or infection; or a first developmental
and a second developmental [illegible].

3. The method according to [illegible] wherein said organism is a plant [illegible].

4. The method according to [illegible].

5. A method for identifying genes which are differentially expressed in a
normal healthy animal and an animal having a disease comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a
fragment of an EST, an entire EST, a fragment of a gene or an entire gene, isolated
from a DNA library prepared from at least one selected cell, tissue, organ or organism
sample in a healthy animal and present in excess relative to the polynucleotide to be
hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a
fragment of an EST, an entire EST, a fragment of a gene or an entire gene, isolated
from a DNA library prepared from at least one selected cell, tissue, organ or organism
sample from an animal having said disease and present in excess relative to the
polynucleotide to be hybridized;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from sources analogous to the sources of step (a), said hybridization
sufficient to form a first and second hybridization pattern on each said first and
second surface, said sample selected from a cell or tissue sample analogous to the
sample of step (a), said hybridization sufficient to form a first and second
hybridization pattern on each said first and second surface;

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from an animal having said disease,
said sample selected from a cell or tissue sample analogous to the sample of step (c),
said hybridization sufficient to form a third and fourth hybridization pattern on each
said first and second surface,

e. comparing at least two of the four hybridization patterns,
wherein genes differentially expressed in said first and second states are identified by
the presence of differences in the hybridization patterns at pre-defined regions;

f. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs or larger
gene fragment from which the oligonucleotide/polynucleotides were obtained,
whereby identification of the EST or larger gene fragment permits identification of
the gene from which the ESTs or larger gene fragment were derived.

6. A method for identifying genes which are differentially expressed in a
normal healthy animal and an animal having a disease comprising:

a. providing a surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene isolated from a DNA library
prepared from the group selected from at least one selected cell, tissue, organ or
organism sample of a healthy animal and an analogous selected sample of an
animal having said disease and both present in excess relative to the polynucleotide to
be hybridized;

b. detectably hybridizing to a first copy of said surface
polynucleotide sequences isolated from a healthy animal, said sample selected from a
cell or tissue sample analogous to the sample of step (a), said hybridization sufficient
to form a first hybridization pattern on said surface;

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from an animal having said disease, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a second hybridization pattern on said surface;

d. comparing the two hybridization patterns, wherein genes 
differentially expressed in a disease state are identified by the presence of differences 

in the hybridization patterns at pre-defined regions;

e. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs from which
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST
permits identification of the gene from which the ESTs were derived.



7. A method for identifying a gene of a pathogen which is expressed in a
biological sample of an animal infected with said pathogen comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample of a
healthy, uninfected animal and present in excess relative to the polynucleotide to be
hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene isolated from at least one
selected cell, tissue, organ or organism sample of an infected animal;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form first and second hybridization patterns on each said
first and second surface,

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from an infected animal, said
sample selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form third and fourth hybridization patterns on each said
first and second surface,

e. comparing the four hybridization patterns, wherein genes of
said pathogen which are expressed in an infected animal are identified by the
presence of differences in the hybridization patterns at pre-defined regions;

f. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs from which
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST
permits identification of the gene from which the ESTs were derived.

8. A method for identifying a gene of a pathogen which is expressed in a
biological sample of an animal infected with said pathogen comprising:

a. providing a surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene isolated from a DNA library
prepared from the group selected from at least one selected cell, tissue, organ or
organism sample of a healthy animal and an analogous selected sample of an
animal having said disease and both present in excess relative to the polynucleotide to
be hybridized;

b. detectably hybridizing to a first copy of said surface
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a first hybridization pattern on said surface;

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from a sample from an infected animal, said
sample selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a second hybridization pattern on said surface;

d. comparing the two hybridization patterns, wherein genes of
said pathogen which are expressed in an infected animal are identified by the
presence of differences in the hybridization patterns at pre-defined regions;

e. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs from which
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST
permits identification of the gene from which the ESTs were derived.

9. A composition suitable for use in hybridization comprising a solid
surface on which is immobilized at pre-defined regions on said surface a plurality of
defined oligonucleotide/polynucleotide sequences for hybridization, each sequence
selected from the group consisting of a fragment of an EST, an entire EST, a fragment
of a gene or an entire gene isolated from a DNA library prepared from the group
selected from at least one selected cell, tissue, organ or organism sample of a healthy
animal, at least one analogous sample of said animal having a disease, at least one
analogous sample of said animal infected with a microbial pathogen, and any
combination thereof.

10. An isolated gene sequence which is differentially expressed in a
normal healthy animal and an animal having a disease, identified by the method of
claim 1.

11. An isolated pathogen gene sequence which is expressed in [illegible]
cell samples of an infected animal identified by the method of claim 7.

12. A diagnostic composition useful for [illegible]
comprising a reagent capable of detectably targeting a gene sequence of claim 10 in a
biological sample of an animal.

13. A diagnostic composition useful for the diagnosis of infection by a
pathogen comprising a reagent capable of detectably targeting a gene sequence of
claim 11 in a biological sample of an animal.



14. An isolated protein produced by expression of a gene sequence of 
claim 10. 



15. An isolated pathogen protein produced by expression of a gene
sequence of claim 11.

16. A therapeutic composition comprising a protein or fragment thereof
selected from the group consisting of a protein of claim 10 and a protein of claim 15.

17. A method for diagnosing a selected disease or infection in an animal
comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample of a healthy
animal and present in excess relative to the polynucleotide to be hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence comprising a fragment of an EST isolated from at least one
said tissue of an animal having said disease;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a DNA library prepared from a sample from a
healthy animal, said sample selected from a cell or tissue sample analogous to the
sample of step (a), said hybridization sufficient to form a first and second
hybridization pattern on each said first and second surface;

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a DNA library prepared from a sample from
an animal having said disease, said sample selected from a cell or tissue sample
analogous to the sample of step (c), said hybridization sufficient to form a third and
fourth hybridization pattern on each said first and second surface;

e. comparing the four hybridization patterns, wherein substantial
differences between the first and third hybridization patterns and the second and
fourth hybridization patterns indicate the presence of said selected disease or
infection in said animal, and substantial similarities in said first and third
hybridization patterns and second and fourth hybridization patterns indicate the
absence of disease or infection.

18. A method for diagnosing a selected disease or infection in an animal 
comprising: 

a. providing a surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence comprising a fragment of an EST isolated from a DNA
library prepared from the group consisting of a selected cell or tissue sample of a
healthy animal and an analogous selected cell or tissue sample of an animal having
said disease;

b. detectably hybridizing to a first copy of said surface
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a first hybridization pattern on said surface;

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from a DNA library prepared from a sample from
an animal having said disease, said sample selected from a cell or tissue sample
analogous to the sample of step (a), said hybridization sufficient to form a second
hybridization pattern on said surface;

d. comparing the two hybridization patterns, wherein substantial
differences between the first and second hybridization patterns indicate the presence
of said selected disease or infection in said animal, and substantial similarities in said
first and second hybridization patterns indicate the absence of disease or infection.
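
By way of illustration only, the pattern comparison of step (d) might be sketched in Python as follows. The claims do not define what makes a difference "substantial", so the sketch assumes a simple rule: the fraction of pre-defined regions whose hybridization status differs between the two patterns is compared against a cutoff. The function names, the cutoff of 0.25 and the sample patterns are hypothetical.

```python
from typing import Dict, Tuple

Region = Tuple[int, int]       # (row, column) of a pre-defined region
Pattern = Dict[Region, bool]   # assumed: True where hybridization was detected


def fraction_differing(reference: Pattern, test: Pattern) -> float:
    """Fraction of shared regions whose hybridization status differs."""
    shared = reference.keys() & test.keys()
    if not shared:
        raise ValueError("patterns share no pre-defined regions")
    diffs = sum(1 for region in shared if reference[region] != test[region])
    return diffs / len(shared)


def indicates_disease(reference: Pattern, test: Pattern, cutoff: float = 0.25) -> bool:
    """Treat a difference at or above the (hypothetical) cutoff as substantial."""
    return fraction_differing(reference, test) >= cutoff


# Hypothetical first pattern (healthy sample) and second pattern (test sample).
healthy = {(0, 0): True, (0, 1): False, (1, 0): True, (1, 1): False}
patient = {(0, 0): True, (0, 1): True, (1, 0): False, (1, 1): False}

print(fraction_differing(healthy, patient))
print(indicates_disease(healthy, patient))
print(indicates_disease(healthy, healthy))
```

Any practical cutoff would have to be chosen empirically for the disease and surface in question; the value used here only makes the example concrete.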
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